- Path: ifi.uio.no!usenet
- From: ludvigp@ifi.uio.no (Ludvig Pedersen)
- Newsgroups: comp.sys.amiga.programmer
- Subject: Re: doubling pixels horizontally
- Date: 6 Mar 1996 18:52:33 GMT
- Organization: Dept. of Informatics, University of Oslo, Norway
- Message-ID: <5257.6639T1152T2935@ifi.uio.no>
- References: <4f4ibc$gl9@news.cs.tu-berlin.de> <591.6610T1165T2102@login.eunet.no><1045.6611T753T2256@vip.cybercity.dk><4faoe1$47@sunsystem5.informatik.tu-muenchen.de><2991.6612T1034T625@vip.cybercity.dk><576.6613T1070T1730@login.eunet.no><1257.6614T57T922@vip.cybercity.dk>
- <1982.6617T1096T103@ifi.uio.no> <4gbjg3$104@sunsystem5.informatik.tu-muenchen.de>
- <4518.6625T1142T92@ifi.uio.no> <4h4hv5$mnn@sunsystem5.informatik.tu-muenchen.de>
- <2444.6635T982T1557@ifi.uio.no> <4hhjlv$5qb@sunsystem5.informatik.tu-muenchen.de>
- NNTP-Posting-Host: gymir.ifi.uio.no
- X-Newsreader: THOR 2.22 (Amiga;TCP/IP)
-
- >|> >Maybe I'm the only one but I can read/build code much better that way.
- >|> >And I could read my code well :)
- >|> WOW!...do you have any super-natural powers? ;^)
- >grrrrrrrrrrrrrrr :) no that's just a subjective thing.
-
- hehehe....sorry about that! (grin) I just couldn't resist. :-)
-
- >Code gets more structured (grrr don't laugh ;) and though more overview.
- >Puting instructions next to each other which are related to each other
- >subq.w #1,a5 : cmp.w #0,a5 : bne loop ;out of data registers
- >just one structure level more, code gets 2 dimensional (blah:) like
- >in C code. And as asm needs more instructions than C, it needs the
- >2 dimensional format even more if you don't wanna lose overwiev. baeh!
- >:)
-
- Asm was never designed for it, and I don't think it looks good either.
-
- >|> >|> On my A1200 7mb/sec is not copy speed but chip write speed.
- >|> >mhm, all people told me the blizzard will _copy_ 7mb/sec.
- >|> >a myth ?
- >|> I think so. But please show me the copy-loop and I'll test it.
- >could you please try movem.l (fast)+,d0-d7 and then 8 times move.l dn,(chip)+
- >?
-
- I tried a LOT of different loops, and here is a small collection of
- the top 5. Actually, the results were a little better than I
- expected.
-
- ALL DMA IS OFF!
-
-
- ; Speed: 5.640 MB/s
-
- move.l (a0)+,d0
- move.l (a0)+,d1
- move.l (a0)+,d2
- move.l (a0)+,d3
- move.l (a0)+,d4
- move.l (a0)+,d5
- move.l (a0)+,d6
- move.l (a0)+,a2
- move.l (a0)+,a3
- move.l (a0)+,a4
- move.l (a0)+,a5
- move.l (a0)+,a6
- move.l d0,(a1)+
- move.l d1,(a1)+
- move.l d2,(a1)+
- move.l d3,(a1)+
- move.l d4,(a1)+
- move.l d5,(a1)+
- move.l d6,(a1)+
- move.l a2,(a1)+
- move.l a3,(a1)+
- move.l a4,(a1)+
- move.l a5,(a1)+
- move.l a6,(a1)+
-
-
-
- ; Speed: 5.472 MB/s
-
- movem.l (a0)+,d0-d6/a2-a6
- move.l d0,(a1)+
- move.l d1,(a1)+
- move.l d2,(a1)+
- move.l d3,(a1)+
- move.l d4,(a1)+
- move.l d5,(a1)+
- move.l d6,(a1)+
- move.l a2,(a1)+
- move.l a3,(a1)+
- move.l a4,(a1)+
- move.l a5,(a1)+
- move.l a6,(a1)+
-
-
-
- ; Speed: 5.472 MB/s
-
- movem.l (a0)+,d0-d6/a2-a6
- movem.l d0-d6/a2-a6,-(a1)
-
-
-
- ; Speed: 4.896 MB/s
-
- .loop:
- rept 16
- move.l (a0)+,(a1)+
- endr
- dbra d7,.loop
-
-
-
- ; Speed: 4.656 MB/s
-
- rept 16
- move.l (a0)+,d0
- move.l d0,(a1)+
- endr
-
-
- >imho this should do 7mb/sec in the store part. if the movem
- >is very fast, you aproximate the 7mb/sec also doing copying.
- 7 MB/s is not possible for a copy. Remember that the reads from FastRam
- have to go over the same data bus.
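
A rough way to see why a copy has to come in under the chip write speed: if each longword is first read from FastRam and then written to ChipRam over the same bus, the two transfer times add up. The sketch below is only a back-of-the-envelope model in Python, not a measurement. It assumes the 7 MB/s chip write figure mentioned earlier in the thread and the fast-read figures from the bustest output quoted further down; `copy_rate` is a made-up helper name.

```python
def copy_rate(read_mb_s, write_mb_s):
    """Copy rate when every byte is read and then written over one shared bus.

    Time per byte is the sum of the read time and the write time,
    so the rates combine like resistors in parallel.
    """
    return 1.0 / (1.0 / read_mb_s + 1.0 / write_mb_s)

CHIP_WRITE = 7.0   # MB/s, chip write speed claimed in the thread (assumption)
FAST_READL = 29.1  # MB/s, "fast readl" from the quoted bustest output
FAST_READM = 23.8  # MB/s, "fast readm" from the quoted bustest output

print(round(copy_rate(FAST_READL, CHIP_WRITE), 2))  # 5.64
print(round(copy_rate(FAST_READM, CHIP_WRITE), 2))  # 5.41
```

Whatever read speed is plugged in, the result stays below the write speed. That the numbers land near the measured loops may be coincidence, since the model ignores caches and instruction fetches.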
-
- >On 020-14 it will be slower than normal copy, on 020-28 maybe already
- >faster (only theory!)
- >so we still need a test if it's faster than move.l (fast)+,(chip)+
- >|> Here is my results from bustest:
- >|>
- >|> BusSpeedTest 0.07 (mlelstv) Buffer: 16384 Bytes
- >|> ==================================================
- >|> loop overhead: 4.5ns
- >|> register move: 40.6ns
-
- >huh ? a register move is 2 cycles. you got 24.63 MHz ?
-
- Ehh.. no, I have 50 MHz.
-
- I tested it myself. (just to be sure)
-
- I was able to do 24400000 register moves and 203300 dbras per second.
-
- A dbra takes 3 times as long as a register move, so that's
- 24400000 + 3 * 203300 = 25009900 register-move equivalents per second,
- or 25.0 peak MIPS.
-
- 1.000.000.000 ns / 25.009.900 = 39.98 ns
-
-
- Check your numbers, it's correct!
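
The arithmetic above can be re-run in a few lines. This is just a cross-check in Python using the counts given in the post; the weighting of one dbra as three register moves is taken from the post, and the 50 MHz clock is the figure stated there. Variable names are illustrative.

```python
reg_moves_per_s = 24_400_000   # measured register moves per second (from the post)
dbras_per_s = 203_300          # measured dbra instructions per second (from the post)
DBRA_COST = 3                  # one dbra counted as 3 register moves, as stated
CLOCK_HZ = 50_000_000          # 50 MHz CPU clock, as stated

# Express everything in register-move equivalents per second.
move_equivalents = reg_moves_per_s + DBRA_COST * dbras_per_s
ns_per_move = 1e9 / move_equivalents
cycles_per_move = ns_per_move * 1e-9 * CLOCK_HZ

print(move_equivalents)            # 25009900
print(round(ns_per_move, 2))       # 39.98
print(round(cycles_per_move, 2))   # 2.0
```

So about 40 ns per register move at 50 MHz works out to 2 clock cycles per move.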
-
- >|> memtype op cycle bandwidth
- >|> fast readw 109.1ns 18.3MByte/s
- >|> fast readl 137.6ns 29.1MByte/s
- >|> fast readm 167.7ns 23.8MByte/s
-
- >readm slower ? hmhmhmhm. nooo.
- Ohhh-yes.. Just look at the copy results.
-
- >if you use enough regs it's faster on 020-14.
- >also reading from chip is faster with readm... mhmhm
-
- >|> Please not that this is write-speed and NOT copy speed.
- >|> >|> I did a simple test and I was able to copy 4.9mb/sec from fastram to
- >|> >|> chipram on a 256 colors screen.
- >|> >what size ? overscan, 320x256,320x200 ? pal ?
- >|> PAL-lowres, no overscan.
- >maybe you can write in 10 14-mhz cycle parts, i.e 5.672mb/sec theoretic
- >if no dma at all.
- Yes, you can write 5.6 MB/s to ChipRam with no DMA.
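
The theoretical figure in the quote can be reproduced with one line of arithmetic. This Python sketch reads "10 14-mhz cycle parts" as one longword (4 bytes) written every 10 cycles of a clock rounded to 14.18 MHz. Both the reading and the rounding are assumptions; the PAL A1200 clock is closer to 14.19 MHz, which gives a marginally higher figure.

```python
CLOCK_MHZ = 14.18        # assumed rounded PAL A1200 clock, in MHz
CYCLES_PER_WRITE = 10    # assumed cycles per ChipRam longword write
BYTES_PER_WRITE = 4      # one longword

# Writes per microsecond times bytes per write gives MB/s (decimal megabytes).
write_mb_s = CLOCK_MHZ / CYCLES_PER_WRITE * BYTES_PER_WRITE

print(round(write_mb_s, 3))  # 5.672
```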
-
- >|> >|> I don't get it??? ;) It is optimal. That was actually VERY easy.
- >|> >|> Remeber that we are taking about 2x2 without sprite-dithering!!
- >|> >either you misuse a plane as mask, so 128 colors only, or the 2x2
- >|> >routine is slower than 3pass, i.e. not optimal :)
- >|> The blitter uses only 2 passes and *is* optimal. And its 256 colors.
-
- >hehe, 2 passes blitter is slower than 1 pass blitter ;)
- >there's a misunderstanding in the word "optimal".
- >while there realy might be no way to do faster the way you do,
- >there's another way to do it :) the other way has disadvantages
- >(monitor sideefects ;) though.
-
- We support both 2xN sprite-dithering (1 pass) and normal 2xN (2 pass).
-
- If your render routine runs at 25 fps or slower, using the 2-pass version
- makes no difference at all to speed or framerate. You only get a
- better-looking display.
-
- Can you explain the monitor side effects you are talking about?
- Is this something new?
-
-
-
- Ludde - Amiga Demo Coder
- Virtual Reality & Official Be developer
- ludvigp@ifi.uio.no
-
-